@martijnvg
Member

Follow-up to #113757. Adds support for natively storing array offsets for ip fields instead of falling back to ignored source.
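
For readers unfamiliar with the approach, the idea can be illustrated roughly as follows. This is a simplified sketch, not the Elasticsearch implementation: `ArrayOffsetsSketch` and its methods are hypothetical names, plain strings stand in for encoded ip values, and using -1 to mark a null is an assumption of this example. Doc values keep only sorted, de-duplicated values, so a per-document offsets array records which ordinal sits at each array position, which is enough to rebuild the original order, duplicates, and nulls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not the actual Elasticsearch classes.
final class ArrayOffsetsSketch {

    // Sorted, de-duplicated values, similar in spirit to what sorted set doc values retain.
    static List<String> sortedUnique(List<String> values) {
        TreeSet<String> set = new TreeSet<>();
        for (String v : values) {
            if (v != null) {
                set.add(v);
            }
        }
        return new ArrayList<>(set);
    }

    // For each array position, record the ordinal of its value (-1 marks a null, by assumption).
    static int[] encodeOffsets(List<String> values, List<String> sortedUnique) {
        int[] offsets = new int[values.size()];
        for (int i = 0; i < values.size(); i++) {
            String v = values.get(i);
            offsets[i] = v == null ? -1 : sortedUnique.indexOf(v);
        }
        return offsets;
    }

    // Rebuild the array in its original order, including duplicates and nulls.
    static List<String> decode(int[] offsets, List<String> sortedUnique) {
        List<String> result = new ArrayList<>(offsets.length);
        for (int ord : offsets) {
            result.add(ord == -1 ? null : sortedUnique.get(ord));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("10.0.0.2", "10.0.0.1", null, "10.0.0.2");
        List<String> unique = sortedUnique(original);
        int[] offsets = encodeOffsets(original, unique);
        System.out.println(unique + " " + Arrays.toString(offsets) + " " + decode(offsets, unique));
    }
}
```

The offsets array is small compared with keeping the whole array in ignored source, which is the motivation for storing it natively.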

@martijnvg martijnvg force-pushed the synthetic_source_ip_native_arrays branch from efc2fc8 to 87e8f8d Compare February 21, 2025 09:39
@martijnvg martijnvg force-pushed the synthetic_source_ip_native_arrays branch from 87e8f8d to 62a3b50 Compare February 21, 2025 10:46
@martijnvg martijnvg added >enhancement v8.19.0 :StorageEngine/Mapping The storage related side of mappings labels Feb 21, 2025
@elasticsearchmachine
Collaborator

Hi @martijnvg, I've created a changelog YAML for you.

@martijnvg martijnvg marked this pull request as ready for review February 21, 2025 11:33
@elasticsearchmachine
Collaborator

Pinging @elastic/es-storage-engine (Team:StorageEngine)

@martijnvg martijnvg requested review from kkrik-es and lkts February 21, 2025 15:58
if (address != null) {
    indexValue(context, address);
}
if (offsetsFieldName != null && context.isImmediateParentAnArray() && context.getRecordedSource() == false) {
Contributor

We should be using context.canAddIgnoredField instead of context.getRecordedSource. We had this comment in the previous PR, let's update KeywordFieldMapper too.

Member Author

Done: 2a72a77

        throw e;
    }
}
if (address != null) {
Contributor

@kkrik-es kkrik-es Feb 24, 2025

Is address set to null, ever? It's set to nullValue or value, iiuc.

Member Author

nullValue is null by default, so address can be set to null.

Contributor

But then, should we be comparing to nullValue here and below?

Member Author

good point: 77c30b1

Member Author

Actually, that change fails the build. If we only check nullValue, then value needs to be checked as well...

Member Author

Pushed another commit.
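
The null handling discussed in this thread can be sketched as follows. This is an illustration under assumptions, not the code from the PR: `NullValueSketch`, `resolveAddress`, and `shouldIndex` are hypothetical names, and the real mapper reads values from a parser rather than a string. It shows why `address` can still end up null: `nullValue` is itself null unless the mapping configures `null_value`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration of the null handling discussed in this thread.
final class NullValueSketch {

    // Returns the parsed address, the null_value substitute, or null when neither is available.
    static InetAddress resolveAddress(String value, InetAddress nullValue) {
        if (value == null) {
            return nullValue; // null unless the mapping configures null_value
        }
        try {
            return InetAddress.getByName(value); // an IP literal is only validated, not resolved via DNS
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("malformed ip: " + value, e);
        }
    }

    // Only non-null results are indexed, mirroring the `address != null` guard.
    static boolean shouldIndex(InetAddress address) {
        return address != null;
    }
}
```

With no `null_value` configured, `resolveAddress(null, null)` returns null and `shouldIndex` is false, so a null check on the resolved address covers both the incoming value and the configured substitute.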


long ord = ords[offset];
BytesRef c = valueDocValues.lookupOrd(ord);
// This is keyword specific and needs to be updated once support is added for other field types:
Contributor

We still need a comment, as the value is written as utf8. Maybe something about what the converter is expected to return, in the constructor?

Member Author

Done: 86a1aa0

Contributor

Why do we enforce utf8 if this goes to XContentBuilder anyway, which has a generic value method that takes anything?
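
To make the converter idea in this thread concrete, here is a rough sketch. It is not the loader from the PR: `ConverterSketch` and its members are hypothetical, plain `byte[]` stands in for Lucene's `BytesRef`, and the ip converter assumes the stored bytes are a raw 4- or 16-byte address, which may differ from the actual doc values encoding. The point is only that keyword bytes are already UTF-8 text, while ip bytes are binary and need formatting before they can be written out as a string.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

// Hypothetical illustration; the real loader works with Lucene BytesRef values.
final class ConverterSketch {

    // keyword-like fields: the stored bytes already are UTF-8 text.
    static final Function<byte[], String> KEYWORD = bytes -> new String(bytes, StandardCharsets.UTF_8);

    // ip-like fields: the stored bytes are a binary address and must be formatted first.
    static final Function<byte[], String> IP = bytes -> {
        try {
            return InetAddress.getByAddress(bytes).getHostAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not a 4- or 16-byte address", e);
        }
    };

    // A loader that is agnostic of the field type and delegates decoding to the converter.
    static String render(byte[] stored, Function<byte[], String> converter) {
        return converter.apply(stored);
    }
}
```

Passing the converter in at construction time keeps the loader free of field-specific branches, which is presumably why a comment on its expected return value was requested.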


@Override
protected String randomSyntheticSourceKeep() {
    return "all";
Contributor

Why do we need this?

Member Author

The testSyntheticSourceKeepArrays(...) unit test assumes ignored source and therefore assumes that nested leaf arrays are maintained, which is no longer the case with the new offset encoding.
This was required for KeywordFieldMapperTests too.

Contributor

Shall we update the test to check the new behavior as an alternative, if keep is arrays?

Member Author

I actually want to merge OffsetDocValuesLoaderTestCase into MapperTestCase and then update this test to test the new behaviour. Let's do this in a follow-up?

Contributor

@kkrik-es kkrik-es left a comment

Nice refactoring.

Contributor

@lkts lkts left a comment

Consider adding ip support to data generation tests :)

@martijnvg
Member Author

> Consider adding ip support to data generation tests :)

I will do that in a follow-up.

@martijnvg martijnvg added the auto-backport Automatically create backport pull requests when merged label Feb 25, 2025
@martijnvg martijnvg enabled auto-merge (squash) February 25, 2025 12:30
@martijnvg martijnvg merged commit 6c55099 into elastic:main Feb 25, 2025
17 checks passed
@elasticsearchmachine
Collaborator

💔 Backport failed

| Status | Branch | Result |
|--------|--------|--------|
|        | 8.x    | Commit could not be cherrypicked due to conflicts |

You can use sqren/backport to manually backport by running backport --upstream elastic/elasticsearch --pr 122999

martijnvg added a commit to martijnvg/elasticsearch that referenced this pull request Feb 25, 2025
Backporting elastic#122999 to 8.x branch.

Follow up of elastic#113757 and adds support to natively store array offsets for ip fields instead of falling back to ignored source.
elasticsearchmachine pushed a commit that referenced this pull request Feb 25, 2025
…ce (#123405)

* [8.x] Store arrays offsets for ip fields natively with synthetic source

Backporting #122999 to 8.x branch.

Follow up of #113757 and adds support to natively store array offsets for ip fields instead of falling back to ignored source.

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <[email protected]>
jordan-powers added a commit that referenced this pull request Mar 20, 2025
…4594)

This patch builds on the work in #122999 and #113757 to natively store
array offsets for numeric fields instead of falling back to ignored source
when `source_keep_mode: arrays`.
smalyshev pushed a commit to smalyshev/elasticsearch that referenced this pull request Mar 21, 2025
…stic#124594)

This patch builds on the work in elastic#122999 and elastic#113757 to natively store
array offsets for numeric fields instead of falling back to ignored source
when `source_keep_mode: arrays`.
elasticsearchmachine pushed a commit that referenced this pull request Mar 21, 2025
#124594) | Fix ignores malformed testcase (#125337) | Fix offsets not recording duplicate values (#125354) (#125440)

* Natively store synthetic source array offsets for numeric fields (#124594)

This patch builds on the work in #122999 and #113757 to natively store
array offsets for numeric fields instead of falling back to ignored source
when `source_keep_mode: arrays`.

(cherry picked from commit 376abfe)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/IndexVersions.java
#	server/src/main/java/org/elasticsearch/index/mapper/NumberFieldMapper.java

* Fix ignores malformed testcase (#125337)

Fix and unmute testSynthesizeArrayRandomIgnoresMalformed

(cherry picked from commit 2ff03ac)

# Conflicts:
#	muted-tests.yml

* Fix offsets not recording duplicate values (#125354)

Previously, when calculating the offsets, we just compared the values as-is
without any loss of precision. However, when the values were saved into doc
values and loaded in the doc values loader, they could have lost precision.
This meant that values that were not duplicates when calculating the
offsets could now be duplicates in the doc values loader. This interfered
with the de-duplication logic, causing incorrect values to be returned.

My solution is to apply the precision loss before calculating the offsets,
so that both the offsets calculation and the SortedNumericDocValues
de-duplication see the same values as duplicates.

(cherry picked from commit db73175)
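
The fix described in #125354 can be illustrated with a small sketch. It is not the code from that commit: `PrecisionOffsetsSketch` is a hypothetical name, and the lossy step is assumed to be a scaled-float-like rounding with a factor of 100. Two inputs that differ before encoding can collapse into one stored value, so offsets have to be computed on the encoded values to agree with what the doc values loader later sees.

```java
import java.util.Arrays;

// Hypothetical illustration of computing offsets after the lossy encoding step.
final class PrecisionOffsetsSketch {

    // Assumed lossy storage step: keep two decimal places, like a scaled float with factor 100.
    static long encode(double value) {
        return Math.round(value * 100);
    }

    // Distinct values seen if offsets were computed on the raw input (the old behaviour).
    static long distinctRaw(double[] values) {
        return Arrays.stream(values).distinct().count();
    }

    // Distinct values that actually survive in doc values after encoding.
    static long distinctEncoded(double[] values) {
        return Arrays.stream(values).mapToLong(PrecisionOffsetsSketch::encode).distinct().count();
    }

    // Offsets computed on encoded values, so they agree with the de-duplicated doc values.
    static int[] offsets(double[] values) {
        long[] sortedUnique = Arrays.stream(values)
            .mapToLong(PrecisionOffsetsSketch::encode)
            .sorted()
            .distinct()
            .toArray();
        int[] result = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = Arrays.binarySearch(sortedUnique, encode(values[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        double[] values = {1.234, 1.2341, 5.0};
        System.out.println(distinctRaw(values) + " raw vs " + distinctEncoded(values) + " encoded");
        System.out.println(Arrays.toString(offsets(values)));
    }
}
```

In this example the first two inputs are distinct as doubles but share one encoded value, so offsets based on the raw inputs would reference an ordinal that does not exist in the stored data.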
jordan-powers added a commit that referenced this pull request Mar 25, 2025
#125529)

This patch builds on the work in #113757, #122999, and #124594 to natively
store array offsets for boolean fields instead of falling back to ignored
source when `synthetic_source_keep: arrays`.
elasticsearchmachine pushed a commit that referenced this pull request Mar 25, 2025
#125529) (#125596)

This patch builds on the work in #113757, #122999, and #124594 to natively
store array offsets for boolean fields instead of falling back to ignored
source when `synthetic_source_keep: arrays`.

(cherry picked from commit af1f145)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/IndexVersions.java
#	server/src/main/java/org/elasticsearch/index/mapper/BooleanFieldMapper.java
jordan-powers added a commit that referenced this pull request Mar 26, 2025
… source (#125709)

This patch builds on the work in #113757, #122999, #124594, and #125529 to
natively store array offsets for unsigned long fields instead of falling
back to ignored source when synthetic_source_keep: arrays.
elasticsearchmachine pushed a commit that referenced this pull request Mar 27, 2025
… source (#125709) (#125746)

This patch builds on the work in #113757, #122999, #124594, and #125529 to
natively store array offsets for unsigned long fields instead of falling
back to ignored source when synthetic_source_keep: arrays.

(cherry picked from commit 689eaf2)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/IndexVersions.java
#	x-pack/plugin/mapper-unsigned-long/src/main/java/org/elasticsearch/xpack/unsignedlong/UnsignedLongFieldMapper.java
jordan-powers added a commit that referenced this pull request Mar 28, 2025
…source (#125793)

This patch builds on the work in #113757, #122999, #124594, #125529, and 
#125709 to natively store array offsets for scaled float fields instead of
falling back to ignored source when synthetic_source_keep: arrays.
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
…stic#124594)

This patch builds on the work in elastic#122999 and elastic#113757 to natively store
array offsets for numeric fields instead of falling back to ignored source
when `source_keep_mode: arrays`.
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
elastic#125529)

This patch builds on the work in elastic#113757, elastic#122999, and elastic#124594 to natively
store array offsets for boolean fields instead of falling back to ignored
source when `synthetic_source_keep: arrays`.
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
… source (elastic#125709)

This patch builds on the work in elastic#113757, elastic#122999, elastic#124594, and elastic#125529 to
natively store array offsets for unsigned long fields instead of falling
back to ignored source when synthetic_source_keep: arrays.
omricohenn pushed a commit to omricohenn/elasticsearch that referenced this pull request Mar 28, 2025
…source (elastic#125793)

This patch builds on the work in elastic#113757, elastic#122999, elastic#124594, elastic#125529, and 
elastic#125709 to natively store array offsets for scaled float fields instead of
falling back to ignored source when synthetic_source_keep: arrays.
elasticsearchmachine pushed a commit that referenced this pull request Mar 28, 2025
…source (#125793) (#125891)

This patch builds on the work in #113757, #122999, #124594, #125529, and
#125709 to natively store array offsets for scaled float fields instead of
falling back to ignored source when synthetic_source_keep: arrays.

(cherry picked from commit 71e74bd)

# Conflicts:
#	server/src/main/java/org/elasticsearch/index/IndexVersions.java

Labels

auto-backport Automatically create backport pull requests when merged >enhancement :StorageEngine/Mapping The storage related side of mappings Team:StorageEngine v8.19.0 v9.1.0

4 participants